-
Notifications
You must be signed in to change notification settings - Fork 176
Cythonize away some perf hot spots #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
/ok to test 227d9c1 |
|
/ok to test 48de1b3 |
/ok to test |
@oleksandr-pavlyk, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
/ok to test 0d573fc |
To fix test failures with CTK 11.8 and driver 535.247.01 only attempt to query _ctx_handle if _device_id is None. Ensure that context handle is set in Stream.context property
0d573fc
to
30de720
Compare
/ok to test 30de720 |
PR description updated. Thanks to @oleksandr-pavlyk, the CI failure was identified and fixed (the refactoring introduced a call to |
/ok to test e95c4b1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/ok to test f4531e5 |
self._mnff = Event._MembersNeededForFinalize(self, None) | ||
|
||
options = check_or_create_options(EventOptions, options, "Event options") | ||
def _init(cls, device_id: int, ctx_handle: Context, options=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Event
contains native class members, perhaps adding __cinit__
to initialize them is appropriate. Something like
def __cinit__(self):
self._timing_disabled = False
self._busy_waited = False
self._device_id = -1
I also think it would be safe to set object
class members to None
.
This would ensure that Event.__new__(Event)
would return an initialized struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Cython sets everything to None for us, but it'd be good to verify this indeed
Cython additionally takes responsibility of setting all object attributes to None,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave object
members out. Should I push adding Event.__cinit__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same section says all members are zero/null initialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but is it appropriate to zero initialize _device_id
? Perhaps it does not matter much.
CI is green |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
Less aggressive version of #677.
Based on the summary in #658 (comment), this PR offers a performance optimization over identified hotspots to bring us to a lot closer with our reference (CuPy). The optimization strategy is to
cuda.bindings
Python APIs in the Cython code, so as to avoid introducing CTK as a build-time dependency (and therefore having to ship two separate packagescuda-core-cu11
andcuda-core-cu12
)In other words, this PR tries to find a reasonable balance between performance, easy of development, and easy of deployment, without introducing any breaking change.
Preliminary data:
Checklist